Skip to content

Comments

Fix long string doubled quote parsing — swev-id: astropy__astropy-14598#101

Open
casey-brooks wants to merge 3 commits intoagyn-sandbox:astropy__astropy-14598from
casey-brooks:fix/fits-card-double-quote-14598-casey
Open

Fix long string doubled quote parsing — swev-id: astropy__astropy-14598#101
casey-brooks wants to merge 3 commits intoagyn-sandbox:astropy__astropy-14598from
casey-brooks:fix/fits-card-double-quote-14598-casey

Conversation

@casey-brooks
Copy link

@casey-brooks casey-brooks commented Dec 18, 2025

Summary

  • Preserve doubled single quotes ("''") across CONTINUE segments when parsing long string FITS Cards
  • Defer unescaping until after full value reassembly; remove per-chunk unescape in Card._split
  • Refine continuation handling: strip ampersand before recovery and avoid pulling in formatting spaces on comment cards
  • Add regression tests for boundary lengths, embedded doubled quotes, null string cases, and comment interactions

Observed Failure

Silent data corruption on round-trip of FITS Card values containing doubled quotes near 80-col boundaries:

  • Trailing doubled quotes lose one quote at certain lengths:
65 False
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx''
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
67 False
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx''
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
68 False
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx''
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
69 False
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx''
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
  • When '' appears within a larger value, mismatches occur across a range of lengths (often around continuation boundaries):
55 67 False
56 68 False
57 69 False
58 70 False
59 71 False
60 72 False
61 73 False
62 74 False
63 75 False
65 77 False
67 79 False
68 80 False
69 81 False
  • Additional reported cases:
    • Trailing quote at end of long line lost:
      • fits.Card.fromstring(fits.Card("FOO", "x"*100 + "''", "comment").image).value → returns a value ending with a single quote.
    • Text after doubled quotes lost if there is a space in between:
      • fits.Card.fromstring(fits.Card("FOO", "x"*100 + "'' aaa", "comment").image).value → truncates after the single quote.

Reproduction Steps

Run any of the following in Python with astropy.io.fits:

from astropy.io import fits

# 1) Doubled quotes at end
for n in range(60, 70):
    card1 = fits.Card('CONFIG', "x" * n + "''")
    card2 = fits.Card.fromstring(str(card1))
    assert card1.value == card2.value

# 2) Doubled quotes inside a longer string
for n in range(50, 70):
    card1 = fits.Card('CONFIG', "x" * n + "''" + "x" * 10)
    card2 = fits.Card.fromstring(str(card1))
    assert card1.value == card2.value

# 3) Additional cases
assert fits.Card.fromstring(fits.Card("FOO", "x"*100 + "''", "comment").image).value.endswith("''")
assert fits.Card.fromstring(fits.Card("FOO", "x"*100 + "'' aaa", "comment").image).value.endswith("'' aaa")

Expected: All assertions pass.
Actual (pre-fix): Some assertions fail at lengths near continuation boundaries.

Stack Trace

No Python exception is raised; this is a silent parsing defect (logical data corruption). Therefore, no traceback is available. The failure is observable via round-trip mismatch as shown above.

Implementation Notes

  • Root cause: In astropy/io/fits/card.py, Card._split (long-image path) unescaped doubled quotes per CONTINUE chunk (replace("''", "'")), introducing stray single quotes at chunk boundaries that the final parser misinterpreted as terminators, leading to quote loss or truncation.
  • Fix: Remove per-chunk unescape; let final parse unescape after full value reassembly. Ensure continuation ampersands are stripped before whitespace recovery, and skip whitespace recovery when a comment field is present.
  • Serialization (_format_value, _format_long_image) unchanged; they already double embedded quotes correctly.

Tests

  • Added regression coverage in astropy/io/fits/tests/test_header.py:
    • Trailing doubled quotes near boundary lengths with and without comments
    • Embedded doubled quotes across boundaries within larger strings
    • Null string '' round-trip, including within long continued values
    • Comment interaction cases (no spurious & or spaces)

Fixes #71

@casey-brooks
Copy link
Author

Test & Lint Results

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 LD_LIBRARY_PATH=/nix/store/wffgswxkp55xi14jy63rjsnfvl2qvmxy-gcc-14.3.0-lib/lib: python -m pytest -p pytest_astropy_header -p pytest_doctestplus /workspace/astropy/.venv/lib/python3.11/site-packages/astropy/io/fits/tests/test_header.py -k "doubled_quotes or regression_cases_with_trailing_quotes or null_string_round_trip_including_continued_value" (6 passed, 175 deselected)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary:\n- Thanks for tightening the continuation handling and regression tests.\n- I spotted one more edge case: when a CONTINUE card has a comment separator but no comment text, yields an empty string, so the new guards treat it as a value fragment and splice the literal into the long-string payload.\n\nPlease change the guards to check (and update the remainder block accordingly) so we only recover whitespace when the comment field is completely absent.

@casey-brooks
Copy link
Author

Test & Lint Results (update)

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 LD_LIBRARY_PATH=/nix/store/wffgswxkp55xi14jy63rjsnfvl2qvmxy-gcc-14.3.0-lib/lib: python -m pytest -p pytest_astropy_header -p pytest_doctestplus /workspace/astropy/.venv/lib/python3.11/site-packages/astropy/io/fits/tests/test_header.py -k "doubled_quotes or regression_cases_with_trailing_quotes or null_string_round_trip_including_continued_value" (6 passed, 175 deselected)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tightening the guards. Confirmed both sites now check , so continuation whitespace recovery no longer triggers for empty-comment CONTINUE lines. Reconstructed values stay clean (no stray or spaces), and the existing regression tests still cover the edge cases.\n\nTests not rerun in this environment.

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tightening the guards. Confirmed both if not comment sites now use comment is None, so continuation whitespace recovery no longer triggers for empty-comment CONTINUE lines. Reconstructed values stay clean (no stray & or spaces), and the existing regression tests still cover the edge cases.

Tests not rerun in this environment.

@rowan-stein rowan-stein changed the title Fix long string doubled quote parsing Fix long string doubled quote parsing — swev-id: astropy__astropy-14598 Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants